<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD><TITLE>Utf manual page - Tcl Library Procedures</TITLE>
<link rel="stylesheet" href="../docs.css" type="text/css" media="all">
</HEAD>
<BODY><H2><a href="../contents.htm">Tcl8.6.11/Tk8.6.11 Documentation</a> <small>&gt;</small> <a href="contents.htm">Tcl C API</a> <small>&gt;</small> Utf</H2>
<DL>
<DD><A HREF="Utf.htm#M2" NAME="L890">NAME</A>
<DD><A HREF="Utf.htm#M3" NAME="L891">SYNOPSIS</A>
<DD><A HREF="Utf.htm#M4" NAME="L892">ARGUMENTS</A>
<DD><A HREF="Utf.htm#M5" NAME="L893">DESCRIPTION</A>
<DD><A HREF="Utf.htm#M6" NAME="L894">KEYWORDS</A>
</DL>
<H3><A NAME="M2">NAME</A></H3>
Tcl_UniChar, Tcl_UniCharToUtf, Tcl_UtfToUniChar, Tcl_UniCharToUtfDString, Tcl_UtfToUniCharDString, Tcl_UniCharLen, Tcl_UniCharNcmp, Tcl_UniCharNcasecmp, Tcl_UniCharCaseMatch, Tcl_UtfNcmp, Tcl_UtfNcasecmp, Tcl_UtfCharComplete, Tcl_NumUtfChars, Tcl_UtfFindFirst, Tcl_UtfFindLast, Tcl_UtfNext, Tcl_UtfPrev, Tcl_UniCharAtIndex, Tcl_UtfAtIndex, Tcl_UtfBackslash &mdash; routines for manipulating UTF-8 strings
<H3><A NAME="M3">SYNOPSIS</A></H3>
<B>#include &lt;tcl.h&gt;</B><BR>
typedef ... <B>Tcl_UniChar</B>;<BR>
int<BR>
<B>Tcl_UniCharToUtf</B>(<I>ch, buf</I>)<BR>
int<BR>
<B>Tcl_UtfToUniChar</B>(<I>src, chPtr</I>)<BR>
char *<BR>
<B>Tcl_UniCharToUtfDString</B>(<I>uniStr, uniLength, dsPtr</I>)<BR>
Tcl_UniChar *<BR>
<B>Tcl_UtfToUniCharDString</B>(<I>src, length, dsPtr</I>)<BR>
int<BR>
<B>Tcl_UniCharLen</B>(<I>uniStr</I>)<BR>
int<BR>
<B>Tcl_UniCharNcmp</B>(<I>ucs, uct, numChars</I>)<BR>
int<BR>
<B>Tcl_UniCharNcasecmp</B>(<I>ucs, uct, numChars</I>)<BR>
int<BR>
<B>Tcl_UniCharCaseMatch</B>(<I>uniStr, uniPattern, nocase</I>)<BR>
int<BR>
<B>Tcl_UtfNcmp</B>(<I>cs, ct, numChars</I>)<BR>
int<BR>
<B>Tcl_UtfNcasecmp</B>(<I>cs, ct, numChars</I>)<BR>
int<BR>
<B>Tcl_UtfCharComplete</B>(<I>src, length</I>)<BR>
int<BR>
<B>Tcl_NumUtfChars</B>(<I>src, length</I>)<BR>
const char *<BR>
<B>Tcl_UtfFindFirst</B>(<I>src, ch</I>)<BR>
const char *<BR>
<B>Tcl_UtfFindLast</B>(<I>src, ch</I>)<BR>
const char *<BR>
<B>Tcl_UtfNext</B>(<I>src</I>)<BR>
const char *<BR>
<B>Tcl_UtfPrev</B>(<I>src, start</I>)<BR>
Tcl_UniChar<BR>
<B>Tcl_UniCharAtIndex</B>(<I>src, index</I>)<BR>
const char *<BR>
<B>Tcl_UtfAtIndex</B>(<I>src, index</I>)<BR>
int<BR>
<B>Tcl_UtfBackslash</B>(<I>src, readPtr, dst</I>)<BR>
<H3><A NAME="M4">ARGUMENTS</A></H3>
<DL class="arguments">
<DT>char <B>*buf</B> (out)<DD>
Buffer in which the UTF-8 representation of the Tcl_UniChar is stored.  At most
<B>TCL_UTF_MAX</B> bytes are stored in the buffer.
<P><DT>int <B>ch</B> (in)<DD>
The Unicode character to be converted or examined.
<P><DT>Tcl_UniChar <B>*chPtr</B> (out)<DD>
Filled with the Tcl_UniChar represented by the head of the UTF-8 string.
<P><DT>const char <B>*src</B> (in)<DD>
Pointer to a UTF-8 string.
<P><DT>const char <B>*cs</B> (in)<DD>
Pointer to a UTF-8 string.
<P><DT>const char <B>*ct</B> (in)<DD>
Pointer to a UTF-8 string.
<P><DT>const Tcl_UniChar <B>*uniStr</B> (in)<DD>
A null-terminated Unicode string.
<P><DT>const Tcl_UniChar <B>*ucs</B> (in)<DD>
A null-terminated Unicode string.
<P><DT>const Tcl_UniChar <B>*uct</B> (in)<DD>
A null-terminated Unicode string.
<P><DT>const Tcl_UniChar <B>*uniPattern</B> (in)<DD>
A null-terminated Unicode string.
<P><DT>int <B>length</B> (in)<DD>
The length of the UTF-8 string in bytes (not UTF-8 characters).  If
negative, all bytes up to the first null byte are used.
<P><DT>int <B>uniLength</B> (in)<DD>
The length of the Unicode string in characters.  Must be greater than or
equal to 0.
<P><DT><A HREF="../TclLib/DString.htm">Tcl_DString</A> <B>*dsPtr</B> (in/out)<DD>
A pointer to a previously initialized <B><A HREF="../TclLib/DString.htm">Tcl_DString</A></B>.
<P><DT>unsigned long <B>numChars</B> (in)<DD>
The number of characters to compare.
<P><DT>const char <B>*start</B> (in)<DD>
Pointer to the beginning of a UTF-8 string.
<P><DT>int <B>index</B> (in)<DD>
The index of a character (not byte) in the UTF-8 string.
<P><DT>int <B>*readPtr</B> (out)<DD>
If non-NULL, filled with the number of bytes in the backslash sequence,
including the backslash character.
<P><DT>char <B>*dst</B> (out)<DD>
Buffer in which the bytes represented by the backslash sequence are stored.
At most <B>TCL_UTF_MAX</B> bytes are stored in the buffer.
<P><DT>int <B>nocase</B> (in)<DD>
Specifies whether the match should be case-sensitive (0) or
case-insensitive (1).
<P></DL>
<H3><A NAME="M5">DESCRIPTION</A></H3>
These routines convert between UTF-8 strings and Tcl_UniChars.  A
Tcl_UniChar is a Unicode character represented as an unsigned, fixed-size
quantity.  A UTF-8 character is a Unicode character represented as
a varying-length sequence of up to <B>TCL_UTF_MAX</B> bytes.  A multibyte UTF-8
sequence consists of a lead byte followed by some number of trail bytes.
<P>
<B>TCL_UTF_MAX</B> is the maximum number of bytes that it takes to
represent one Unicode character in the UTF-8 representation.
<P>
<B>Tcl_UniCharToUtf</B> stores the Tcl_UniChar <I>ch</I> as a UTF-8 string
starting at <I>buf</I>.  The return value is the number of bytes stored
in <I>buf</I>.
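<P>
The byte layout that such a conversion produces can be illustrated with a
minimal, standalone sketch.  The function <B>encode_utf8</B> below is a
hypothetical helper written for this example; it is not part of Tcl, does not
call <B>Tcl_UniCharToUtf</B>, and makes no attempt to reproduce Tcl-specific
details such as the handling of the null character or of surrogates.  It
assumes standard UTF-8 with code points up to U+10FFFF.

```c
/* Hypothetical helper, not part of Tcl: encode one code point as
 * standard UTF-8.  buf must have room for 4 bytes.  Returns the
 * number of bytes written, mirroring the return convention
 * described above. */
int encode_utf8(unsigned int ch, char *buf)
{
    if (ch < 0x80) {
        /* One byte: 0xxxxxxx */
        buf[0] = (char) ch;
        return 1;
    }
    if (ch < 0x800) {
        /* Two bytes: 110xxxxx 10xxxxxx */
        buf[0] = (char) (0xC0 | (ch >> 6));
        buf[1] = (char) (0x80 | (ch & 0x3F));
        return 2;
    }
    if (ch < 0x10000) {
        /* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (char) (0xE0 | (ch >> 12));
        buf[1] = (char) (0x80 | ((ch >> 6) & 0x3F));
        buf[2] = (char) (0x80 | (ch & 0x3F));
        return 3;
    }
    /* Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    buf[0] = (char) (0xF0 | (ch >> 18));
    buf[1] = (char) (0x80 | ((ch >> 12) & 0x3F));
    buf[2] = (char) (0x80 | ((ch >> 6) & 0x3F));
    buf[3] = (char) (0x80 | (ch & 0x3F));
    return 4;
}
```

The first byte written is the lead byte and the remaining ones are trail
bytes, which is why a caller needs a buffer sized for the longest possible
sequence before converting a character.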
<P>
<B>Tcl_UtfToUniChar</B> reads one UTF-8 character starting at <I>src</I>
and stores it as a Tcl_UniChar in <I>*chPtr</I>.  The return value is the
number of bytes read from <I>src</I>.  The caller must ensure that the
source buffer is long enough such that this routine does not run off the
end and dereference non-existent or random memory; if the source buffer
is known to be null-terminated, this will not happen.  If the input is
not in proper UTF-8 format, <B>Tcl_UtfToUniChar</B> will store the first
byte of <I>src</I> in <I>*chPtr</I> as a Tcl_UniChar between 0x0080 and
0x00FF and return 1.
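<P>
The fallback rule for malformed input can be sketched as follows.  The
function <B>decode_utf8</B> is a hypothetical, standalone helper and not Tcl's
implementation: for brevity it only decodes sequences of up to three bytes,
does not reject overlong forms, and assumes that <I>src</I> is
null-terminated so that checking trail bytes never reads past the end.

```c
/* Hypothetical helper, not part of Tcl: decode one UTF-8 sequence of
 * up to three bytes from a null-terminated string.  Returns the
 * number of bytes consumed.  On malformed input the first byte is
 * stored as-is and 1 is returned, as described above. */
int decode_utf8(const char *src, unsigned int *chPtr)
{
    const unsigned char *p = (const unsigned char *) src;
    unsigned int b0 = p[0];

    if (b0 < 0x80) {
        /* Plain ASCII (or the terminating null byte). */
        *chPtr = b0;
        return 1;
    }
    if ((b0 & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
        /* Two-byte lead followed by one valid trail byte. */
        *chPtr = ((b0 & 0x1F) << 6) | (p[1] & 0x3F);
        return 2;
    }
    if ((b0 & 0xF0) == 0xE0 && (p[1] & 0xC0) == 0x80
            && (p[2] & 0xC0) == 0x80) {
        /* Three-byte lead followed by two valid trail bytes. */
        *chPtr = ((b0 & 0x0F) << 12) | ((p[1] & 0x3F) << 6)
                | (p[2] & 0x3F);
        return 3;
    }
    /* Malformed or unsupported: pass the byte through (0x80..0xFF). */
    *chPtr = b0;
    return 1;
}
```

Because the return value is always at least 1, a loop that advances by the
returned count makes progress even over malformed data.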
<P>
<B>Tcl_UniCharToUtfDString</B> converts the given Unicode string
to UTF-8, storing the result in a previously initialized <B><A HREF="../TclLib/DString.htm">Tcl_DString</A></B>.
You must specify <I>uniLength</I>, the length of the given Unicode string.
The return value is a pointer to the UTF-8 representation of the
Unicode string.  Storage for the return value is appended to the
end of the <B><A HREF="../TclLib/DString.htm">Tcl_DString</A></B>.
<P>
<B>Tcl_UtfToUniCharDString</B> converts the given UTF-8 string to Unicode,
storing the result in the previously initialized <B><A HREF="../TclLib/DString.htm">Tcl_DString</A></B>.
In the argument <I>length</I>, you may either specify the length of
the given UTF-8 string in bytes or
&ldquo;-1&rdquo;,
in which case <B>Tcl_UtfToUniCharDString</B> uses <B>strlen</B> to
calculate the length.  The return value is a pointer to the Unicode
representation of the UTF-8 string.  Storage for the return value
is appended to the end of the <B><A HREF="../TclLib/DString.htm">Tcl_DString</A></B>.  The Unicode string
is terminated with a Unicode null character.
<P>
<B>Tcl_UniCharLen</B> corresponds to <B>strlen</B> for Unicode
characters.  It accepts a null-terminated Unicode string and returns
the number of Unicode characters (not bytes) in that string.
<P>
<B>Tcl_UniCharNcmp</B> and <B>Tcl_UniCharNcasecmp</B> correspond to
<B>strncmp</B> and <B>strncasecmp</B>, respectively, for Unicode characters.
They accept two null-terminated Unicode strings and the number of characters
to compare.  Both strings are assumed to be at least <I>numChars</I> characters
long. <B>Tcl_UniCharNcmp</B>  compares the two strings character-by-character
according to the Unicode character ordering.  It returns an integer greater
than, equal to, or less than 0 if the first string is greater than, equal
to, or less than the second string, respectively.  <B>Tcl_UniCharNcasecmp</B>
is the case-insensitive version.
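<P>
The requirement that both strings be at least <I>numChars</I> characters long
follows naturally from a comparison loop that is driven by the count alone.
The sketch below is a hypothetical, standalone illustration and not Tcl's
implementation; the type <B>UniChar16</B> is an assumed 16-bit stand-in for
Tcl_UniChar, whose actual definition is not given here.

```c
/* Assumed stand-in for Tcl_UniChar, for illustration only. */
typedef unsigned short UniChar16;

/* Hypothetical helper, not part of Tcl: compare the first numChars
 * elements of two arrays.  The loop does not look for a terminating
 * null, so both arrays must hold at least numChars elements. */
int unichar_ncmp(const UniChar16 *ucs, const UniChar16 *uct,
        unsigned long numChars)
{
    for (; numChars != 0; numChars--, ucs++, uct++) {
        if (*ucs != *uct) {
            /* Sign of the difference gives the ordering. */
            return (int) *ucs - (int) *uct;
        }
    }
    return 0;
}
```

Only the sign of the result is meaningful to a caller; the magnitude is an
artifact of this particular sketch.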
<P>
<B>Tcl_UniCharCaseMatch</B> is the Unicode equivalent of
<B><A HREF="../TclLib/StrMatch.htm">Tcl_StringCaseMatch</A></B>.  It accepts a null-terminated Unicode string,
a Unicode pattern, and a boolean value specifying whether the match should
ignore case, and returns whether the string matches the pattern.
<P>
<B>Tcl_UtfNcmp</B> corresponds to <B>strncmp</B> for UTF-8 strings. It
accepts two null-terminated UTF-8 strings and the number of characters
to compare.  (Both strings are assumed to be at least <I>numChars</I>
characters long.)  <B>Tcl_UtfNcmp</B> compares the two strings
character-by-character according to the Unicode character ordering.
It returns an integer greater than, equal to, or less than 0 if the
first string is greater than, equal to, or less than the second string
respectively.
<P>
<B>Tcl_UtfNcasecmp</B> corresponds to <B>strncasecmp</B> for UTF-8
strings.  It is similar to <B>Tcl_UtfNcmp</B> except comparisons ignore
differences in case when comparing upper, lower or title case
characters.
<P>
<B>Tcl_UtfCharComplete</B> returns 1 if the source UTF-8 string <I>src</I>
of <I>length</I> bytes is long enough to be decoded by
<B>Tcl_UtfToUniChar</B>, or 0 otherwise.  This function does not guarantee
that the UTF-8 string is properly formed.  This routine is used by
procedures that are operating on a byte at a time and need to know if a
full Tcl_UniChar has been seen.
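<P>
A completeness check of this kind only needs the lead byte and the number of
bytes available.  The function <B>utf8_char_complete</B> below is a
hypothetical, standalone sketch based on standard UTF-8 lead-byte patterns; it
is not Tcl's implementation, and Tcl's treatment of four-byte sequences may
differ depending on how <B>TCL_UTF_MAX</B> is configured.

```c
/* Hypothetical helper, not part of Tcl: return 1 if the first
 * `length` bytes at src are enough to hold the whole sequence
 * announced by the lead byte, else 0.  Trail bytes are not
 * inspected, so a 1 does not mean the sequence is well formed. */
int utf8_char_complete(const char *src, int length)
{
    unsigned char lead;
    int needed = 1;     /* ASCII and stray bytes need one byte. */

    if (length < 1) {
        return 0;
    }
    lead = (unsigned char) src[0];
    if ((lead & 0xE0) == 0xC0) {
        needed = 2;
    } else if ((lead & 0xF0) == 0xE0) {
        needed = 3;
    } else if ((lead & 0xF8) == 0xF0) {
        needed = 4;
    }
    return length >= needed;
}
```

A reader that receives data in chunks could use such a check to decide whether
to decode now or to wait for more bytes.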
<P>
<B>Tcl_NumUtfChars</B> corresponds to <B>strlen</B> for UTF-8 strings.  It
returns the number of Tcl_UniChars that are represented by the UTF-8 string
<I>src</I>.  The length of the source string is <I>length</I> bytes.  If the
length is negative, all bytes up to the first null byte are used.
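<P>
The difference between byte length and character count can be seen in a
short standalone sketch.  The function <B>count_utf8_chars</B> is a
hypothetical helper, not part of Tcl, and it assumes well-formed input: it
simply counts every byte that is not a trail byte.

```c
#include <string.h>

/* Hypothetical helper, not part of Tcl: count the characters in the
 * first `length` bytes of a well-formed UTF-8 string.  A negative
 * length means "up to the first null byte", as described above. */
int count_utf8_chars(const char *src, int length)
{
    int i, count = 0;

    if (length < 0) {
        length = (int) strlen(src);
    }
    for (i = 0; i < length; i++) {
        /* Trail bytes look like 10xxxxxx; everything else starts
         * a new character. */
        if (((unsigned char) src[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For a string containing one two-byte character among four ASCII letters, this
sketch reports five characters while <B>strlen</B> reports six bytes.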
<P>
<B>Tcl_UtfFindFirst</B> corresponds to <B>strchr</B> for UTF-8 strings.  It
returns a pointer to the first occurrence of the Tcl_UniChar <I>ch</I>
in the null-terminated UTF-8 string <I>src</I>.  The null terminator is
considered part of the UTF-8 string.
<P>
<B>Tcl_UtfFindLast</B> corresponds to <B>strrchr</B> for UTF-8 strings.  It
returns a pointer to the last occurrence of the Tcl_UniChar <I>ch</I>
in the null-terminated UTF-8 string <I>src</I>.  The null terminator is
considered part of the UTF-8 string.
<P>
Given <I>src</I>, a pointer to some location in a UTF-8 string,
<B>Tcl_UtfNext</B> returns a pointer to the next UTF-8 character in the
string.  The caller must not ask for the next character after the last
character in the string if the string is not terminated by a null
character.
<P>
<B>Tcl_UtfPrev</B> is used to step backward through but not beyond the
UTF-8 string that begins at <I>start</I>.  If the UTF-8 string is made
up entirely of complete and well-formed characters, and <I>src</I> points
to the lead byte of one of those characters (or to the location one byte
past the end of the string), then repeated calls of <B>Tcl_UtfPrev</B> will
return pointers to the lead bytes of each character in the string, one
character at a time, terminating when it returns <I>start</I>.
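<P>
For the well-formed case, stepping backward amounts to skipping trail bytes
until a lead byte is found, without going before <I>start</I>.  The function
<B>utf8_prev</B> below is a hypothetical, standalone sketch of that idea; it
is not Tcl's implementation and does not handle incomplete or malformed
sequences.

```c
/* Hypothetical helper, not part of Tcl: return a pointer to the lead
 * byte of the character that ends just before src, never moving
 * before start.  Assumes the bytes from start to src are complete,
 * well-formed UTF-8 and that src points at a character boundary. */
const char *utf8_prev(const char *src, const char *start)
{
    if (src <= start) {
        return start;
    }
    src--;
    /* Back up over trail bytes (10xxxxxx) to reach the lead byte. */
    while (src > start && ((unsigned char) *src & 0xC0) == 0x80) {
        src--;
    }
    return src;
}
```

Starting from the location one byte past the end of the string and calling
the helper repeatedly visits each character once, last to first, and stops
making progress once it returns <I>start</I>.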
<P>
When the conditions of completeness and well-formedness may not be satisfied,
a more precise description of the function of <B>Tcl_UtfPrev</B> is necessary.
It always returns a pointer greater than or equal to <I>start</I>; that is,
always a pointer to a location in the string. It always returns a pointer to
a byte that begins a character when scanning for characters beginning
from <I>start</I>. When <I>src</I> is greater than <I>start</I>, it
always returns a pointer less than <I>src</I> and greater than or
equal to (<I>src</I> - <B>TCL_UTF_MAX</B>).  The character that begins
at the returned pointer is the first one that either includes the
byte <I>src[-1]</I>, or might include it if the right trail bytes are
present at <I>src</I> and greater. <B>Tcl_UtfPrev</B> never reads the
byte <I>src[0]</I> nor the byte <I>start[-1]</I> nor the byte
<I>src[-</I><B>TCL_UTF_MAX</B><I>-1]</I>.
<P>
<B>Tcl_UniCharAtIndex</B> corresponds to a C string array dereference or the
Pascal Ord() function.  It returns the Tcl_UniChar represented at the
specified character (not byte) <I>index</I> in the UTF-8 string
<I>src</I>.  The source string must contain at least <I>index</I>
characters.  Behavior is undefined if a negative <I>index</I> is given.
<P>
<B>Tcl_UtfAtIndex</B> returns a pointer to the specified character (not
byte) <I>index</I> in the UTF-8 string <I>src</I>.  The source string must
contain at least <I>index</I> characters.  This is equivalent to calling
<B>Tcl_UtfToUniChar</B> <I>index</I> times.  If a negative <I>index</I> is given,
the return pointer points to the first character in the source string.
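<P>
Indexing by character rather than by byte means walking the string one
character at a time.  The two functions below are hypothetical, standalone
helpers that illustrate this for well-formed, null-terminated input; they are
not part of Tcl and do not call <B>Tcl_UtfNext</B> or <B>Tcl_UtfAtIndex</B>.
As a simplifying assumption, the stepping helper refuses to move past the
terminating null byte.

```c
/* Hypothetical helper, not part of Tcl: advance to the next
 * character in a well-formed, null-terminated UTF-8 string.
 * Stays put at the terminating null byte. */
const char *utf8_next(const char *src)
{
    if (*src == '\0') {
        return src;
    }
    src++;
    /* Skip the trail bytes (10xxxxxx) of the current character. */
    while (((unsigned char) *src & 0xC0) == 0x80) {
        src++;
    }
    return src;
}

/* Hypothetical helper, not part of Tcl: return a pointer to the
 * character at the given character index.  A negative or zero index
 * leaves the pointer at the first character. */
const char *utf8_at_index(const char *src, int index)
{
    while (index-- > 0) {
        src = utf8_next(src);
    }
    return src;
}
```

Because each lookup walks from the beginning, the cost grows with
<I>index</I>; code that visits every character in order is usually better
served by stepping once per character than by indexing repeatedly.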
<P>
<B>Tcl_UtfBackslash</B> is a utility procedure used by several of the Tcl
commands.  It parses a backslash sequence and stores the properly formed
UTF-8 character represented by the backslash sequence in the output
buffer <I>dst</I>.  At most <B>TCL_UTF_MAX</B> bytes are stored in the buffer.
If <I>readPtr</I> is non-NULL, <B>Tcl_UtfBackslash</B> sets <I>*readPtr</I> to the number
of bytes in the backslash sequence, including the backslash character.
The return value is the number of bytes stored in the output buffer.
<P>
See the <B><A HREF="../TclCmd/Tcl.htm">Tcl</A></B> manual entry for information on the valid backslash
sequences.  All of the sequences described in the <A HREF="../TclCmd/Tcl.htm">Tcl</A> manual entry are
supported by <B>Tcl_UtfBackslash</B>.

<H3><A NAME="M6">KEYWORDS</A></H3>
<A href="../Keywords/U.htm#utf">utf</A>, <A href="../Keywords/U.htm#unicode">unicode</A>, <A href="../Keywords/B.htm#backslash">backslash</A>
<div class="copy">Copyright &copy; 1997 Sun Microsystems, Inc.
</div>
</BODY></HTML>
